Computational Method for Annotation of Protein Sequence According to Gene Ontology Terms
Authors
Abstract
Annotating a protein sequence is pivotal to understanding its function. The accuracy of manual annotation by curators remains questionable because of its weaker evidence strength, and manual annotation is also difficult and time-consuming. A number of computational methods and tools have been developed to tackle this task. However, they require costly hardware, are difficult for bioscientists to set up, or depend on time-intensive, blind sequence similarity searches such as the Basic Local Alignment Search Tool (BLAST). This paper introduces a new method of assigning highly correlated Gene Ontology terms of annotated protein sequences to partially annotated or newly discovered protein sequences. The method is based entirely on Gene Ontology data and annotations. Two problems had to be solved to realize it. The first is splitting the single monolithic Gene Ontology RDF/XML file into a set of smaller files that are easy to access and process, so that these files can be enriched with protein sequences and Inferred from Electronic Annotation (IEA) evidence associations. The second is searching for a set of Gene Ontology terms that are semantically similar to a given query. The macro- and micro-level problems involved, their solutions, and the objective of this study are described. This paper also describes protein sequence annotation and the Gene Ontology. The methodology of this study and the Gene Ontology based protein sequence annotation tool, extended UTMGO, are presented. Its basic version, a Gene Ontology browser based on semantic similarity search, is also introduced.
Keywords: Automatic clustering, Bioinformatics tool, Gene Ontology, Protein sequence annotation, Semantic similarity search.
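The first problem named in the abstract, splitting a monolithic Gene Ontology RDF/XML file into smaller per-term pieces, can be illustrated with a minimal sketch. This is not the paper's implementation. The namespace URIs, the `go:term` and `go:accession` element names, and the two-term sample are assumptions modelled on the legacy GO RDF/XML export, and `split_terms` and `write_terms` are hypothetical helper names.

```python
import xml.etree.ElementTree as ET
from pathlib import Path

# Assumed namespaces, modelled on the legacy GO RDF/XML export.
GO_NS = "http://www.geneontology.org/dtds/go.dtd#"
RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"

# Keep readable prefixes when individual terms are serialized.
ET.register_namespace("go", GO_NS)
ET.register_namespace("rdf", RDF_NS)

# Tiny illustrative stand-in for the monolithic file (assumed layout).
SAMPLE = f"""<go:go xmlns:go="{GO_NS}" xmlns:rdf="{RDF_NS}">
  <rdf:RDF>
    <go:term rdf:about="http://www.geneontology.org/go#GO:0000001">
      <go:accession>GO:0000001</go:accession>
      <go:name>mitochondrion inheritance</go:name>
    </go:term>
    <go:term rdf:about="http://www.geneontology.org/go#GO:0000002">
      <go:accession>GO:0000002</go:accession>
      <go:name>mitochondrial genome maintenance</go:name>
    </go:term>
  </rdf:RDF>
</go:go>"""


def split_terms(xml_text):
    """Map each term accession to a standalone XML string for that term."""
    root = ET.fromstring(xml_text)
    pieces = {}
    for term in root.iter(f"{{{GO_NS}}}term"):
        accession = term.findtext(f"{{{GO_NS}}}accession")
        if accession is None:
            continue  # skip terms without an accession
        pieces[accession] = ET.tostring(term, encoding="unicode").strip()
    return pieces


def write_terms(pieces, out_dir):
    """Write one small file per term, e.g. GO_0000001.xml, and return the paths."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    paths = []
    for accession, xml_piece in pieces.items():
        path = out / (accession.replace(":", "_") + ".xml")
        path.write_text(xml_piece, encoding="utf-8")
        paths.append(path)
    return paths
```

For a real multi-megabyte export, a streaming parser such as `ET.iterparse` would avoid loading the whole tree into memory. The in-memory version is used here only to keep the sketch short.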
Similar resources
UTMGO: A Tool for Searching a Group of Semantically Related Gene Ontology Terms and Application to Annotation of Anonymous Protein Sequence
Gene Ontology terms have been actively used to annotate various protein sets. SWISS-PROT, TrEMBL, and InterPro are protein databases that are annotated according to the Gene Ontology terms. However, direct implementation of the Gene Ontology terms for annotation of anonymous protein sequences is not easy, especially for species not commonly represented in biological databases. UTMGO is develope...
CESSM: Collaborative Evaluation of Semantic Similarity Measures
The application of semantic similarity measures to proteins annotated with Gene Ontology terms has become a common method in bioinformatics. However, the evaluation of these measures is still challenging, since no common standard of evaluation exists. We present an online tool for the automated evaluation of GO-based semantic similarity measures, CESSM, that enables the comparison of new measur...
Hierarchical Classification of Gene Ontology Terms Using the Gostruct Method
Protein function prediction is an active area of research in bioinformatics. Yet, the transfer of annotation on the basis of sequence or structural similarity remains widely used as an annotation method. Most of today's machine learning approaches reduce the problem to a collection of binary classification problems: whether a protein performs a particular function, sometimes with a post-process...
Enhanced Sequence-Based Function Prediction Methods and Application to Functional Similarity Networks
After reviewing the underlying framework required for computational function prediction in the previous chapter, we discuss two advanced sequence-based function prediction methods developed in our group, namely the Protein Function Prediction (PFP) method and the Extended Similarity Group (ESG) method. PFP extends the traditional homology search by incorporating functional associations between p...
Identification and prioritization genes related to Hypercholesterolemia QTLs using gene ontology and protein interaction networks
Gene identification represents the first step to a better understanding of the physiological role of the underlying protein and disease pathways, which in turn serves as a starting point for developing therapeutic interventions. Familial hypercholesterolemia is a hereditary metabolic disorder characterized by high low-density lipoprotein cholesterol levels. Hypercholesterolemia is a quantitativ...
Journal title:
Volume, issue:
Pages: -
Publication date: 2012